 fidelity and diversity





TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models

Neural Information Processing Systems

We propose a robust and reliable evaluation metric for generative models called Topological Precision and Recall (TopP&R, pronounced "topper"), which systematically estimates supports by retaining only topologically and statistically significant features with a certain level of confidence. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and various Precision and Recall (P&R) variants, rely heavily on support estimates derived from sample features. However, the reliability of these estimates has been overlooked, even though the quality of the evaluation hinges entirely on their accuracy. In this paper, we demonstrate that current methods not only fail to accurately assess sample quality when support estimation is unreliable, but also yield inconsistent results. In contrast, TopP&R reliably evaluates the sample quality and ensures statistical consistency in its results. Our theoretical and experimental findings reveal that TopP&R provides a robust evaluation, accurately capturing the true trend of change in samples, even in the presence of outliers and non-independent and identically distributed (Non-IID) perturbations where other methods result in inaccurate support estimations. To our knowledge, TopP&R is the first evaluation metric specifically focused on the robust estimation of supports, offering statistical consistency under noise conditions.
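For context on the support estimates mentioned in the abstract, here is a minimal sketch of a conventional k-nearest-neighbour precision and recall estimate. It is not TopP&R itself, whose topological and statistical filtering is not specified in this listing. The function names, the Euclidean distance, and the default `k=3` are illustrative assumptions.

```python
import numpy as np


def pairwise_dist(a, b):
    # Euclidean distance between every row of a and every row of b.
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_radii(x, k):
    # Distance from each point to its k-th nearest neighbour within x.
    # After sorting, column 0 is the zero distance of a point to itself.
    d = np.sort(pairwise_dist(x, x), axis=1)
    return d[:, k]


def coverage(queries, support_pts, radii):
    # Fraction of queries lying inside at least one ball centred on a
    # support point; the union of balls is the estimated support.
    d = pairwise_dist(queries, support_pts)
    return float((d <= radii[None, :]).any(axis=1).mean())


def knn_precision_recall(real, fake, k=3):
    # Precision: generated samples covered by the estimated real support.
    # Recall: real samples covered by the estimated generated support.
    precision = coverage(fake, real, knn_radii(real, k))
    recall = coverage(real, fake, knn_radii(fake, k))
    return precision, recall
```

Under these assumptions, a single real outlier can dominate the estimate: its k-NN ball is very large, so generated samples far from the real cluster may still count as covered. This is the kind of unreliable support estimation the abstract describes.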


MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems

arXiv.org Artificial Intelligence

Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. While generative models have shown promise in data augmentation, existing approaches are limited to single modalities and fail to capture cross-domain dependencies. This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework that couples tabular flow features with their transformed images through a unified latent prior. By jointly training Transformer- and CNN-based variational encoders with an EDM-style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis. Evaluations on CIC-IDS-2017 and NSL-KDD demonstrate significant improvements in fidelity, diversity, and downstream detection performance over TabSyn and TabDDPM, highlighting MAGE-ID's effectiveness for multimodal IDS augmentation.


Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density

arXiv.org Artificial Intelligence

We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our approach offers three distinct techniques to adjust the fidelity and diversity of deep generative models: 1) Per-sample perturbation, enabling precise adjustments for individual samples towards either more common or more unique characteristics; 2) Importance sampling during model inference to enhance either fidelity or diversity in the generated data; 3) Fine-tuning with importance sampling, which guides the generative model to learn an adjusted distribution, thus controlling fidelity and diversity. Furthermore, our fine-tuning method demonstrates the ability to improve the Fréchet Inception Distance (FID) for pre-trained generative models with minimal iterations.
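The abstract does not give the formula for pseudo density, so the sketch below only illustrates the general idea of technique 2 (importance sampling at inference). It assumes, hypothetically, that a sample's pseudo density is the inverse of its mean distance to the `k` nearest real samples, and that sampling weights are that density raised to an exponent `alpha`. All names and parameters are illustrative and are not taken from the paper.

```python
import numpy as np


def pseudo_density(samples, real, k=2):
    # ASSUMED definition: inverse mean distance to the k nearest real samples.
    diff = samples[:, None, :] - real[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def importance_weights(density, alpha):
    # alpha > 0 favours dense, typical samples (fidelity);
    # alpha < 0 favours sparse, unusual samples (diversity).
    w = density ** alpha
    return w / w.sum()


def resample(samples, weights, n, seed=0):
    # Draw n samples with replacement according to the importance weights.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(samples), size=n, replace=True, p=weights)
    return samples[idx]
```

With a positive `alpha`, generated samples close to the real data are drawn more often; a negative `alpha` shifts mass towards samples in sparsely populated regions.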


Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics

arXiv.org Artificial Intelligence

Evaluating generative models is one of the most challenging tasks (Naeem et al., 2020). This kind of challenge is largely absent in discriminative models, where evaluation primarily involves comparison with ground truth data. However, for generative models, evaluation involves quantifying the similarity between real samples and those generated by the model. A common method for evaluating generative models is through human judgment metrics, such as Mean Opinion Scores (MOS) (Streijl et al., 2016). However, this type of evaluation assumes a uniform perception among users regarding what constitutes ideal and realistic generation, which is often not the case. For this reason, generative models require quantitative evaluation based on measures of similarity between real and generated samples. This similarity is quantified on two dimensions: fidelity and diversity. On the one hand, fidelity is the measure of similarity between real and generated spaces on the marginal distribution scale. On the other hand, diversity is the measure of how varied a set of samples is, indicating the extent to which the diversity of the generated set in generative models aligns with the diversity of the real set.


PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation

arXiv.org Artificial Intelligence

We propose a comprehensive sample-based method for assessing the quality of generative models. The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution, providing a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models trained on the same dataset. This comparison can be conducted by dividing the space into non-overlapping regions and comparing the number of data samples ...

With advancements in generative models, evaluating their performance using rigorous, clearly defined metrics and criteria has become increasingly essential. Disambiguating true from modeled distributions is especially pertinent in light of the growing emphasis on AI safety within the community, as well as in scientific domains where stringent standards of rigor and uncertainty quantification are needed for the adoption of machine learning methods. When evaluating generative models, we are interested in three qualitative properties (Stein et al., 2023; Jiralerspong et al., 2023): Fidelity refers to the quality and realism of individual outputs generated by a model. It assesses how indistinguishable each generated sample is from real data.
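The idea of dividing the space into non-overlapping regions and comparing sample counts can be sketched as follows. This is a rough illustration under stated assumptions, not the PQMass implementation: the regions are taken to be nearest-centre (Voronoi) cells around reference points drawn from the pooled samples, and the counts are compared with a Pearson chi-squared statistic. The function names, `n_regions`, and `seed` are hypothetical.

```python
import numpy as np


def region_counts(x, centers):
    # Assign each sample to its nearest centre and count samples per region.
    diff = x[:, None, :] - centers[None, :, :]
    nearest = (diff ** 2).sum(axis=-1).argmin(axis=1)
    return np.bincount(nearest, minlength=len(centers))


def chi2_two_sample(counts_a, counts_b):
    # Pearson chi-squared statistic for a 2 x R table of region counts.
    table = np.stack([counts_a, counts_b]).astype(float)
    table = table[:, table.sum(axis=0) > 0]  # drop regions empty in both sets
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()
    stat = ((table - expected) ** 2 / expected).sum()
    dof = table.shape[1] - 1
    return float(stat), int(dof)


def compare_samples(a, b, n_regions=4, seed=0):
    # ASSUMPTION: region centres are drawn at random from the pooled samples.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    idx = rng.choice(len(pooled), size=n_regions, replace=False)
    centers = pooled[idx]
    return chi2_two_sample(region_counts(a, centers), region_counts(b, centers))
```

A small statistic relative to the degrees of freedom is consistent with both sets coming from the same distribution; converting it to a probability would need the chi-squared survival function, which is left out to keep the sketch NumPy-only.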


TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models

arXiv.org Artificial Intelligence

We propose a robust and reliable evaluation metric for generative models called Topological Precision and Recall (TopP&R, pronounced "topper"), which systematically estimates supports by retaining only topologically and statistically significant features with a certain level of confidence. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and various Precision and Recall (P&R) variants, rely heavily on support estimates derived from sample features. However, the reliability of these estimates has been overlooked, even though the quality of the evaluation hinges entirely on their accuracy. In this paper, we demonstrate that current methods not only fail to accurately assess sample quality when support estimation is unreliable, but also yield inconsistent results. In contrast, TopP&R reliably evaluates the sample quality and ensures statistical consistency in its results. Our theoretical and experimental findings reveal that TopP&R provides a robust evaluation, accurately capturing the true trend of change in samples, even in the presence of outliers and non-independent and identically distributed (Non-IID) perturbations where other methods result in inaccurate support estimations. To our knowledge, TopP&R is the first evaluation metric specifically focused on the robust estimation of supports, offering statistical consistency under noise conditions.